ABSTRACT


This thesis reports on a simulation study of parametric and nonparametric
procedures for obtaining confidence intervals for the logarithm of the probability that a
semi-Markov process does not enter a particular state before a fixed time t. Three estimators
and confidence interval procedures are proposed and compared. The
estimators use different amounts of information about the process. The maximum
likelihood estimator and its normal confidence interval procedure use the most; the
estimator based on the empirical distribution function of the observed first passage
times uses the least. An estimator based on an exponential approximation to the
survivor function of the first passage time uses an intermediate amount of information;
confidence intervals for this last estimator are obtained using jackknife and bootstrap
procedures. The maximum likelihood procedure is the most efficient if the underlying
model is correct. If the model is not correct, the empirical survivor function estimator
appears to be best for small times and the estimator based on the exponential
approximation best for large times.
I. INTRODUCTION


A. OBJECTIVES

Finite state space semi-Markov models find application in a variety of areas such
as queueing theory, reliability, and clinical trials. The application of these models often
centers on the distribution of a first-passage time to a state or a set of states,
representing for example the lifetime of a system or marking the end of a busy period
for a server. Suppose that the observations of the path of the semi-Markov process are
all that is known about the process.

This thesis reports the results of a simulation experiment to compare various
parametric and nonparametric estimators of the natural logarithm of the probability that a
semi-Markov process does not enter a particular state before a given fixed time t. In
what follows we will use "ln probability" for "natural logarithm of the probability".
The specific semi-Markov model and estimators considered are given in Chapter II.
Chapter III contains the details of the simulation experiment and results. Conclusions
from the study are given in Chapter IV.


B. SCOPE OF THE THESIS 

The purpose of this thesis is to use simulation to compare estimators and their
confidence intervals for the ln probability that a particular semi-Markov process does not
enter a particular state before a given fixed time t. The particular semi-Markov model
and estimators considered are given in Chapter II. A simulation study comparing bias
and standard errors for these estimators was reported in Gallagher (1986) [Ref. 1]. In
this thesis, we are primarily interested in comparing confidence interval procedures.

The estimation procedure which uses the least information about the semi-Markov
process uses the empirical distribution of the observed first passage times. The
estimation procedure which uses the most information assumes a parametric form
for the sojourn time distribution in each state and a transition matrix to describe the
transitions between states; the parameters of the sojourn time distributions and the
transition probabilities can be estimated using maximum likelihood. An estimation
procedure requiring less information uses nonparametric estimators of the sojourn time
distributions and the maximum likelihood estimators for the transition probabilities.
In many cases, parametric assumptions concerning the sojourn time distribution
are difficult to justify. It was demonstrated in Park (1986) [Ref. 2] that incorrect
parametric assumptions may lead to very biased estimators. Hence, a nonparametric
estimation procedure may be preferred to a parametric one when actual data are used.
However, the nonparametric procedure can be expected to be less efficient than a
parametric one provided the parametric model is correct.

The thesis is organized as follows. In Chapter II, the nature of the problem is
described and several parametric and nonparametric confidence interval estimation
procedures are introduced. In Chapter III, the simulation experiment is described and
results are given. Conclusions drawn from the simulation study are given in Chapter
IV.


II. NATURE OF THE PROBLEM


A. DESCRIPTION OF PROBLEM 

The semi-Markov process model used in the simulations to compare the
estimators is as follows.

Suppose we observe N individuals. Let $X_t(i)$ be the state of the i-th individual
at time t. We will assume $\{X_t(i),\ t \ge 0\}$, $i = 1, 2, \ldots, N$, are independent
semi-Markov processes with three states {0, 1, 2} having the same distribution as
$\{X_t,\ t \ge 0\}$. All individuals start at t = 0 in state 1 ($X_0 = 1$). An individual stays in
state 1 for a random length of time having distribution function $F_1$. Upon leaving state 1, the
process transitions to state 0 with probability $\theta$ and to state 2 with probability $1-\theta$.
If the process transitions to state 2, it spends a random length of time there having
distribution function $F_2$. From state 2, the process transitions to state 1 with
probability 1. State 0 is an absorbing state: once in it, the individual never leaves.

For all individuals, the entire path of transitions and sojourn times is observed
until the time of absorption in state 0 (Fig. 2.1).

Let

$$D = \inf\{t \ge 0 : X_t = 0\}$$

be the entrance time to state 0 (or the time of death).

The problem is to estimate the logarithm of the first passage time survivor
function P{D > t} for fixed time t from the data obtained by observing the N
individuals.


B. THEORETICAL DEVELOPMENTS 
Two estimators for P{D > t} will be described in this section. They are the
maximum likelihood estimator and the asymptotic renewal estimator from a paper by
Jacobs [Ref. 3].

Figure 2.1 Transition State Step (Sojourn Time).


1. Maximum Likelihood Estimate for Continuous Time Markov Chain
In this subsection, the maximum likelihood estimator for P{D > t} will be given
for the special case in which the sojourn time in state i is exponentially distributed
with mean $1/p_i$ (i = 1, 2).
Let R be the number of transitions from 1 to 2 for one individual. The log
likelihood function for the individual is

$$L = R\ln(1-\theta) + \ln\theta + R\ln p_2 + (R+1)\ln p_1 - p_1 T_1 - p_2 T_2, \tag{2.1}$$

where $T_i$ (i = 1, 2) is the total time spent in state i before death.

The maximum likelihood estimators are

$$\hat\theta = \frac{1}{1+R}, \tag{2.2}$$

$$\hat p_1 = \frac{R+1}{T_1}, \tag{2.3}$$

$$\hat p_2 = \frac{R}{T_2}. \tag{2.4}$$


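Further, R is geometric with parameter $\theta$, so $E\{R\} = (1-\theta)/\theta$. Since the N individuals are
independent, the maximum likelihood estimators using the data for all N individuals
are

$$\hat\theta = \frac{N}{N+R}, \tag{2.5}$$

$$\hat p_1 = \frac{N+R}{T_1}, \tag{2.6}$$

$$\hat p_2 = \frac{R}{T_2}, \tag{2.7}$$

where R is the total number of transitions from 1 to 2 for all N individuals and $T_i$ is
the total time spent in state i for the N individuals. For the estimators based on data
for all N individuals,

$$\mathrm{Var}\{\hat\theta\} = I^{-1}(\theta), \tag{2.8}$$

$$\mathrm{Var}\{\hat p_1\} = I^{-1}(p_1), \tag{2.9}$$

$$\mathrm{Var}\{\hat p_2\} = I^{-1}(p_2), \tag{2.10}$$

where, with R inside the expectations denoting the number of transitions from 1 to 2 for a
single individual,

$$I(\theta) = N\,E\left\{\frac{R}{(1-\theta)^2} + \frac{1}{\theta^2}\right\}, \tag{2.11}$$

$$I(p_1) = N\,E\left\{\frac{R+1}{p_1^2}\right\}, \tag{2.12}$$

$$I(p_2) = N\,E\left\{\frac{R}{p_2^2}\right\}, \tag{2.13}$$

with N being the number of individuals.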
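The pooled estimators of equations (2.5) - (2.7) can be sketched in a few lines of Python. This is a minimal illustration, not the Fortran code used in the thesis; the `Path` container and the function name `pooled_mle` are hypothetical, and the sketch assumes every individual is observed until absorption, so that an individual with R transitions from 1 to 2 contributes R + 1 sojourns in state 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    """Observed path of one individual until absorption (hypothetical container)."""
    sojourns_1: List[float]  # sojourn times in state 1 (R + 1 of them)
    sojourns_2: List[float]  # sojourn times in state 2 (R of them)


def pooled_mle(paths: List[Path]) -> Tuple[float, float, float]:
    """Return (theta_hat, p1_hat, p2_hat) under exponential sojourn times."""
    n = len(paths)
    r = sum(len(p.sojourns_2) for p in paths)      # total 1 -> 2 transitions
    t1 = sum(sum(p.sojourns_1) for p in paths)     # total time in state 1
    t2 = sum(sum(p.sojourns_2) for p in paths)     # total time in state 2
    theta_hat = n / (n + r)                        # eqn 2.5
    p1_hat = (n + r) / t1                          # eqn 2.6
    # eqn 2.7; undefined when no individual ever visited state 2
    p2_hat = r / t2 if r > 0 else float("nan")
    return theta_hat, p1_hat, p2_hat
```

For example, two individuals with paths `Path([1.0, 2.0], [0.5])` and `Path([3.0], [])` give N = 2, R = 1, $T_1$ = 6 and $T_2$ = 0.5.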


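Let D be the time of entrance to state 0 for an individual. Fix t and put

$$S(t) = P\{D > t\};$$

then

$$S(t) = \frac{\lambda_2(\lambda_1 + p_2)\exp(\lambda_1 t) - \lambda_1(\lambda_2 + p_2)\exp(\lambda_2 t)}{p_2(\lambda_2 - \lambda_1)}, \tag{2.14}$$

where $\lambda_1$, $\lambda_2$ are the roots of the equation

$$\lambda^2 + \lambda(p_1 + p_2) + \theta p_1 p_2 = 0. \tag{2.15}$$

A parametric estimator for S(t), the survival function, is

$$\hat P_M\{D > t\} = \frac{\hat\lambda_2(\hat\lambda_1 + \hat p_2)\exp(\hat\lambda_1 t) - \hat\lambda_1(\hat\lambda_2 + \hat p_2)\exp(\hat\lambda_2 t)}{\hat p_2(\hat\lambda_2 - \hat\lambda_1)}, \tag{2.16}$$

where $\hat\lambda_1$ and $\hat\lambda_2$ are roots of the equation

$$\lambda^2 + \lambda(\hat p_1 + \hat p_2) + \hat\theta \hat p_1 \hat p_2 = 0. \tag{2.17}$$

Since the maximum likelihood estimators are uncorrelated, the asymptotic
variance of $\hat P_M\{D > t\}$, which will be denoted by $\mathrm{Var}_M$, is approximately

$$\mathrm{Var}_M = \mathrm{Var}\{\hat\theta\}\left(\frac{\partial S}{\partial\theta}\right)^2 + \mathrm{Var}\{\hat p_1\}\left(\frac{\partial S}{\partial p_1}\right)^2 + \mathrm{Var}\{\hat p_2\}\left(\frac{\partial S}{\partial p_2}\right)^2, \tag{2.18}$$

where

$$\frac{\partial S}{\partial\theta} = \frac{\partial S}{\partial\lambda_1}\frac{\partial\lambda_1}{\partial\theta} + \frac{\partial S}{\partial\lambda_2}\frac{\partial\lambda_2}{\partial\theta}, \tag{2.19}$$

$$\frac{\partial S}{\partial p_1} = \frac{\partial S}{\partial\lambda_1}\frac{\partial\lambda_1}{\partial p_1} + \frac{\partial S}{\partial\lambda_2}\frac{\partial\lambda_2}{\partial p_1}, \tag{2.20}$$

$$\frac{\partial S}{\partial p_2} = \frac{\lambda_1\lambda_2\{\exp(\lambda_2 t) - \exp(\lambda_1 t)\}}{p_2^2(\lambda_2 - \lambda_1)} + \frac{\partial S}{\partial\lambda_1}\frac{\partial\lambda_1}{\partial p_2} + \frac{\partial S}{\partial\lambda_2}\frac{\partial\lambda_2}{\partial p_2}, \tag{2.21}$$

$$\frac{\partial S}{\partial\lambda_1} = \frac{\lambda_2\{1 + (\lambda_1 + p_2)t\}\exp(\lambda_1 t) - (\lambda_2 + p_2)\exp(\lambda_2 t)}{p_2(\lambda_2 - \lambda_1)} + \frac{S(t)}{\lambda_2 - \lambda_1}, \tag{2.22}$$

$$\frac{\partial S}{\partial\lambda_2} = \frac{(\lambda_1 + p_2)\exp(\lambda_1 t) - \lambda_1\{1 + (\lambda_2 + p_2)t\}\exp(\lambda_2 t)}{p_2(\lambda_2 - \lambda_1)} - \frac{S(t)}{\lambda_2 - \lambda_1}, \tag{2.23}$$

and, for i = 1, 2,

$$\frac{\partial\lambda_i}{\partial\theta} = \frac{-p_1 p_2}{p_1 + p_2 + 2\lambda_i}, \tag{2.24}$$

$$\frac{\partial\lambda_i}{\partial p_1} = \frac{-(\lambda_i + \theta p_2)}{p_1 + p_2 + 2\lambda_i}, \tag{2.25}$$

$$\frac{\partial\lambda_i}{\partial p_2} = \frac{-(\lambda_i + \theta p_1)}{p_1 + p_2 + 2\lambda_i}. \tag{2.26}$$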
The asymptotic variance of $\ln \hat P_M\{D > t\}$ is approximately $\mathrm{Var}_M / (\hat P_M\{D > t\})^2$.
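As a numerical check on the survivor function for the exponential model, the following Python sketch evaluates P{D > t} from the two roots of $\lambda^2 + \lambda(p_1 + p_2) + \theta p_1 p_2 = 0$, with the weights on the two exponentials written in terms of $p_2$. The function name `survivor` is hypothetical, and the sketch assumes $0 < \theta < 1$ so that the two roots are distinct.

```python
import math


def survivor(t: float, theta: float, p1: float, p2: float) -> float:
    """P{D > t} for the three-state model with exponential sojourn times,
    starting in state 1 (assumes 0 < theta < 1, p1 > 0, p2 > 0)."""
    s = p1 + p2
    disc = math.sqrt(s * s - 4.0 * theta * p1 * p2)
    lam1 = (-s + disc) / 2.0   # root closer to zero
    lam2 = (-s - disc) / 2.0
    denom = p2 * (lam2 - lam1)
    a = lam2 * (lam1 + p2) / denom     # weight on exp(lam1 * t)
    b = -lam1 * (lam2 + p2) / denom    # weight on exp(lam2 * t)
    return a * math.exp(lam1 * t) + b * math.exp(lam2 * t)
```

With $p_1 = 1.0$, $p_2 = 10.0$ and $\theta = 0.5$, `survivor(0.5, 0.5, 1.0, 10.0)` agrees with the value 0.7866 listed in Appendix A to four decimal places.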
2. Asymptotic Renewal Estimator 

This subsection describes an estimation procedure for P{D > t} which is based
on an exponential approximation to the survivor function P{D > t}. Details of the
approximation, which is obtained from asymptotic renewal theoretic results, are given in
Jacobs [Ref. 3]. The approximation improves as t becomes larger.

Let $\hat\theta = N/(N+R)$ be the maximum likelihood estimator of $\theta$ (equation
2.5). Let $\hat\phi_i(\alpha)$ be the empirical transform of the sojourn time in state i; that is, if
$S_1(i), S_2(i), \ldots, S_{M_i}(i)$ denote all the sojourn times in state i for the N individuals, then

$$\hat\phi_i(\alpha) = \frac{1}{M_i}\sum_{k=1}^{M_i}\exp(\alpha S_k(i)). \tag{2.27}$$

The asymptotic renewal estimator of the survival distribution [Ref. 3] is

$$\hat P_A\{D > t\} = \frac{\hat b}{\hat f'(\hat\kappa)}\exp(-\hat\kappa t), \tag{2.28}$$

where $\hat\kappa$ is the solution to the equation

$$\hat f(\alpha) = (1-\hat\theta)\,\hat\phi_1(\alpha)\,\hat\phi_2(\alpha) - 1 = 0, \tag{2.29}$$

$$\hat f'(\hat\kappa) = \frac{1-\hat\theta}{M_1 M_2}\sum_{k=1}^{M_1}\sum_{j=1}^{M_2}\{S_k(1) + S_j(2)\}\exp\{\hat\kappa(S_k(1) + S_j(2))\}, \tag{2.30}$$

$$\hat\phi_1(\hat\kappa) = \frac{1}{M_1}\sum_{k=1}^{M_1}\exp(\hat\kappa S_k(1)), \tag{2.31}$$

and

$$\hat b = (\hat\theta/\hat\kappa)\,\hat\phi_1(\hat\kappa). \tag{2.32}$$

Note that when $\alpha = 0$, the left hand side of equation (2.29) is $-\hat\theta < 0$. As
$\alpha$ increases, the left hand side of the equation increases to $\infty$; thus there is a unique
solution $\hat\kappa$. The solution may be found by numerical search. One possible numerical
search procedure is the golden section search method.
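The asymptotic renewal estimator can be sketched as follows. This is an illustration under stated assumptions rather than the thesis code: the function names are hypothetical, at least one transition from 1 to 2 must have been observed, and the root of equation (2.29) is located by a golden section search that minimizes $|\hat f(\alpha)|$ on a bracket found by doubling (the source names golden section search only as one possible method). The function returns the estimate on the ln scale.

```python
import math
from typing import Callable, List


def golden_root(f: Callable[[float], float], tol: float = 1e-10) -> float:
    """Root of an increasing function with f(0) < 0, found by golden
    section search on |f| (which is unimodal on the bracket)."""
    hi = 1e-3
    while f(hi) < 0.0:          # expand until the root is bracketed
        hi *= 2.0
    lo = 0.0
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if abs(f(c)) < abs(f(d)):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


def renewal_log_estimate(t: float, n: int, s1: List[float], s2: List[float]) -> float:
    """ln of the asymptotic renewal estimate of P{D > t}.
    n: number of individuals; s1, s2: pooled sojourn times in states 1 and 2."""
    r = len(s2)                                   # total 1 -> 2 transitions
    theta = n / (n + r)

    def phi(a: float, s: List[float]) -> float:   # empirical transform
        return sum(math.exp(a * x) for x in s) / len(s)

    def f(a: float) -> float:                     # left side of eqn 2.29
        return (1.0 - theta) * phi(a, s1) * phi(a, s2) - 1.0

    kappa = golden_root(f)
    deriv = (1.0 - theta) * sum(
        (x + y) * math.exp(kappa * (x + y)) for x in s1 for y in s2
    ) / (len(s1) * len(s2))                       # derivative of f at kappa
    b = (theta / kappa) * phi(kappa, s1)
    return math.log(b / deriv) - kappa * t
```

The double sum for the derivative costs $M_1 M_2$ operations; since it factors into products of single sums, a production version could compute it in $M_1 + M_2$ operations.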
3. Empirical Distribution Estimate
Let $d_1, d_2, \ldots, d_N$ denote the observed times of absorption in state 0 for the N
individuals. A binomial estimator for the survivor function is

$$\hat P_B\{D > t\} = \frac{1}{N}\sum_{i=1}^{N} I_{(t,\infty)}(d_i), \tag{2.33}$$

where $I_{(t,\infty)}(d_i) = 1$ if $d_i > t$, and 0 otherwise.


C. CONFIDENCE INTERVAL PROCEDURE 
Suppose X is a random variable whose probability law depends on an unknown
parameter $\theta$. Given a random sample of X, $x_1, x_2, \ldots, x_n$, the two statistics lower (L)
and upper (U) form a $100(1-\alpha)\%$ confidence interval for $\theta$ if

$$P\{L \le \theta \le U\} = 1 - \alpha.$$

Procedures to obtain confidence intervals for ln P{D > t} from the point estimates
are given below.
1. Maximum Likelihood Estimator 
The maximum likelihood estimator is asymptotically normal as the number of
individuals N becomes large [Ref. 4]. For a fixed t, the maximum likelihood
confidence interval for ln P{D > t} is

$$\{L, U\} = \ln \hat P_M\{D > t\} \pm z_{1-\alpha/2}\,\sqrt{\mathrm{Var}_M}\,/\,\hat P_M\{D > t\}, \tag{2.34}$$

where $\mathrm{Var}_M$ is obtained by using the maximum likelihood estimators of $\theta$, $p_1$, and $p_2$ in
equations (2.11) - (2.26). Confidence limits that are larger than 0 are set equal to 0.
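The interval of equation (2.34) can be illustrated with a short Python sketch. Two simplifications are assumptions of the sketch and not the thesis procedure: the partial derivatives of S(t) are approximated by central finite differences instead of the analytic expressions, and the variances of the estimators are the plug-in inverse Fisher informations obtained from $E\{R\} = (1-\theta)/\theta$. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def survivor(t, theta, p1, p2):
    """P{D > t} for the exponential model (assumes 0 < theta < 1)."""
    s = p1 + p2
    disc = math.sqrt(s * s - 4.0 * theta * p1 * p2)
    lam1, lam2 = (-s + disc) / 2.0, (-s - disc) / 2.0
    a = -(lam2 + theta * p1) / (lam1 - lam2)
    return a * math.exp(lam1 * t) + (1.0 - a) * math.exp(lam2 * t)


def mle_log_interval(t, theta_hat, p1_hat, p2_hat, n, alpha=0.10, h=1e-6):
    """Approximate 100(1 - alpha)% interval for ln P{D > t} (delta method)."""
    est = [theta_hat, p1_hat, p2_hat]
    # Plug-in variances: inverse Fisher information for N individuals.
    variances = [
        theta_hat ** 2 * (1.0 - theta_hat) / n,
        p1_hat ** 2 * theta_hat / n,
        p2_hat ** 2 * theta_hat / (n * (1.0 - theta_hat)),
    ]
    var_m = 0.0
    for i, v in enumerate(variances):
        up, dn = list(est), list(est)
        up[i] += h
        dn[i] -= h
        grad = (survivor(t, *up) - survivor(t, *dn)) / (2.0 * h)  # numerical dS/dparam
        var_m += v * grad ** 2
    s_hat = survivor(t, *est)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(var_m) / s_hat
    center = math.log(s_hat)
    # Limits larger than 0 are set equal to 0.
    return min(center - half, 0.0), min(center + half, 0.0)
```

Because the variances shrink like 1/N, quadrupling the number of individuals roughly halves the interval length.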
2. Empirical Distribution of First Passage Time (Binomial and Normal Approx.
Confidence Interval Methods)
a. Binomial Confidence Interval
Since the N individuals are assumed independent, N times the estimator $\hat P_B\{D > t\}$
of equation (2.33) has a binomial distribution with N trials and probability of success
P{D > t}. The IMSL routine BELBIN was used to obtain binomial confidence intervals
for P{D > t}. For a description of the procedure see [Ref. 5: p.390 - 391]. The binomial
confidence interval for ln P{D > t} is obtained by taking logarithms of the upper and
lower confidence limits. If the lower confidence limit for P{D > t} is equal to 0, it is
set equal to 0.0001.
b. Approximate Normal Confidence Interval
If the number of individuals N is large, the binomial confidence interval
procedure can be approximated by a normal confidence interval procedure as follows
[Ref. 6: p.99 - 100, p.954 - 955]. The interval is

$$\{L, U\} = \hat P_B\{D > t\} \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}_B}, \tag{2.35}$$

$$\mathrm{Var}_B = (1/N)\,\hat P_B\{D > t\}\,(1 - \hat P_B\{D > t\}).$$

This interval {L, U} is an approximate $100(1-\alpha)\%$ confidence interval for P{D > t}.
The interval for ln P{D > t} is {ln L, ln U}. If either L or U is less than 0, it is set
equal to 0.0001. If either L or U is greater than 1, it is set equal to 1.
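The empirical estimator and its approximate normal interval on the ln scale can be sketched as below. The function names are hypothetical. One detail is an assumption of the sketch: a limit that is exactly 0 is also replaced by 0.0001, so that the logarithm is always defined. The exact binomial interval is not reproduced here, since the source obtains it from the IMSL routine BELBIN.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def empirical_survivor(death_times: List[float], t: float) -> float:
    """Fraction of individuals not yet absorbed in state 0 at time t."""
    return sum(1 for d in death_times if d > t) / len(death_times)


def normal_log_interval(death_times: List[float], t: float,
                        alpha: float = 0.10) -> Tuple[float, float]:
    """Approximate normal interval for ln P{D > t} from the empirical estimator."""
    n = len(death_times)
    p = empirical_survivor(death_times, t)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(p * (1.0 - p) / n)

    def clamp(x: float) -> float:
        if x <= 0.0:        # assumption: exact zero treated like a negative limit
            return 0.0001
        return min(x, 1.0)

    return math.log(clamp(p - half)), math.log(clamp(p + half))
```

With absorption times `[1.0, 2.0, 3.0, 4.0]` and t = 2.5 the estimate is 0.5, and the 90% interval on the ln scale brackets ln 0.5.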
3. Asymptotic Renewal Procedure 

In this subsection, two nonparametric methods are described for obtaining
confidence intervals using the estimator $\hat P_A\{D > t\}$. One is the jackknife. The other is
the bootstrap. They are described below.

a. Jackknife Estimation Method

The jackknife was first introduced by Quenouille (1956) [Ref. 7] for the
purpose of reducing the bias of an estimate, and the procedure was later utilized by Tukey
(1958) [Ref. 8] to develop a general method for obtaining approximate confidence
intervals.

The basic idea of the jackknife estimation method is to assess the effect of each
of the groups into which the data have been divided, not by the results for that
group alone, but rather through the effect upon the body of data that results
from omitting that group. The two bases of the jackknife are that we make the
desired calculation for all the data, and then, after dividing the data into groups,
we make the calculations for each of the slightly reduced bodies of data obtained
by leaving out just one of the groups. [Ref. 9]


The jackknife procedure is as follows:

(1) Let $Y_{all}(t)$ be the estimate $\hat P_A\{D > t\}$ computed using all the data.

(2) Let $Y_j(t)$ be the statistic computed using the data which omit the jth
subgroup, where j = 1, 2, ..., k. In this thesis, for the case in which the
number of individuals N equals 20, the jth subgroup consists of all data
corresponding to the jth individual. For the case in which the number of
individuals N equals 50, the first subgroup consists of all data corresponding to
the first 5 individuals, the second subgroup of the second 5 individuals, etc.
Some cases for N = 50 were run with the jth subgroup consisting of the jth
individual. The resulting confidence intervals differed little from those obtained
by leaving out 5 individuals at a time. Computational considerations led us to
use the smaller number of subgroups in this case.

(3) Define the jth pseudo-value by

$$Y_{*j}(t) = k \ln Y_{all}(t) - (k-1) \ln Y_j(t). \tag{2.36}$$

(4) The jackknife estimator $Y_*(t)$ is

$$Y_*(t) = \frac{1}{k}\sum_{j=1}^{k} Y_{*j}(t) = k \ln Y_{all}(t) - \frac{k-1}{k}\sum_{j=1}^{k} \ln Y_j(t). \tag{2.37}$$

(5) The jackknife estimator of the variance of $Y_*(t)$ is

$$S_*^2(t) = \frac{1}{k(k-1)}\sum_{j=1}^{k}\left(Y_{*j}(t) - Y_*(t)\right)^2. \tag{2.38}$$

Tukey (1958) proposes that in a wide variety of problems the k estimated
pseudo-values can be treated as approximately independent and identically
distributed random variables [Ref. 8], to obtain the following confidence
interval procedure.

(6) The jackknife confidence interval is computed as follows:

$$\{L, U\} = Y_*(t) \pm t_{1-\alpha/2}\sqrt{S_*^2(t)}, \tag{2.39}$$

where $t_{1-\alpha/2}$ is the $1-\alpha/2$ critical point of the t-distribution with k-1
degrees of freedom. The confidence interval given by equation (2.39) is a
function of the estimated variance. If either confidence limit is greater than 0,
it is set equal to 0.
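Steps (1) to (6) of the jackknife procedure can be sketched generically in Python. The names are hypothetical: `estimator` stands for any function that maps a list of data groups to an estimate of P{D > t} in (0, 1], for instance the asymptotic renewal estimator, and `t_crit` is the t critical point with k - 1 degrees of freedom, which the caller is assumed to supply from tables.

```python
import math
from typing import Callable, List, Sequence, Tuple


def jackknife_log_interval(
    groups: Sequence,
    estimator: Callable[[List], float],
    t_crit: float,
) -> Tuple[float, float, float]:
    """Return (Y_star, L, U) for ln P{D > t} from k leave-one-group-out fits."""
    k = len(groups)
    log_all = math.log(estimator(list(groups)))           # step (1)
    pseudo = []
    for j in range(k):
        reduced = list(groups[:j]) + list(groups[j + 1:])  # step (2): omit group j
        pseudo.append(k * log_all - (k - 1) * math.log(estimator(reduced)))  # step (3)
    y_star = sum(pseudo) / k                               # step (4)
    var = sum((v - y_star) ** 2 for v in pseudo) / (k * (k - 1))  # step (5)
    half = t_crit * math.sqrt(var)                         # step (6)
    # Limits greater than 0 are set equal to 0.
    return y_star, min(y_star - half, 0.0), min(y_star + half, 0.0)
```

A useful sanity check: when the ln of the estimator is the mean of the group values, the pseudo-values reduce to the group values themselves, so the interval is the usual t interval for a mean.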

b. Bootstrap Estimation Method 

Efron (1979) introduced the bootstrap method for estimating the
distribution of a statistic computed from observations [Ref. 10]. In this thesis, the
bootstrap was implemented as follows.

Suppose data are gathered for N individuals. Let $R_{12}(n)$ be the number of
transitions from state 1 to state 2 for individual n. Let $\{S_k(i)\}$ be the collection of all
sojourn times in state i for all individuals. A bootstrap replication is generated as
follows. To generate data for one individual, one observation is drawn at random with
replacement from $\{R_{12}(n)\}$; call the observation $r_n$. Then $r_n + 1$ observations are drawn at
random with replacement from the collection of state 1 sojourn times, $\{S_k(1)\}$, and $r_n$
observations are drawn at random with replacement from the collection of sojourn
times in state 2, $\{S_k(2)\}$. This procedure is replicated N times to generate bootstrap
data for N individuals. The estimator $\ln \hat P_A\{D > t\}$ is computed using the generated
data. This completes one bootstrap replication; B bootstrap replications are done.
The B estimates of $\ln \hat P_A\{D > t\}$ are ordered. A $100(1-\alpha)\%$ confidence interval is
constructed using the $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap estimates of
$\ln \hat P_A\{D > t\}$. If either confidence limit is larger than 0, it is set equal to 0. In the
simulations the number of bootstrap replications is B = 100.
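The resampling scheme described above can be sketched as follows. The function and argument names are hypothetical, `log_estimator` stands for any function that maps a list of (state 1 sojourns, state 2 sojourns) pairs to an estimate of ln P{D > t}, and the rule used to pick the order statistics for the two quantiles is an assumption, since the source does not state its quantile convention.

```python
import random
from typing import Callable, List, Optional, Tuple


def bootstrap_log_interval(
    r_counts: List[int],          # R_12(n) for each individual
    s1: List[float],              # pooled sojourn times in state 1
    s2: List[float],              # pooled sojourn times in state 2
    log_estimator: Callable[[List[Tuple[List[float], List[float]]]], float],
    alpha: float = 0.10,
    b: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for ln P{D > t}."""
    if rng is None:
        rng = random.Random()
    n = len(r_counts)
    estimates = []
    for _ in range(b):
        paths = []
        for _ in range(n):
            r = rng.choice(r_counts)                       # resampled transition count
            draws_1 = [rng.choice(s1) for _ in range(r + 1)]
            draws_2 = [rng.choice(s2) for _ in range(r)]
            paths.append((draws_1, draws_2))
        estimates.append(log_estimator(paths))
    estimates.sort()
    lower = estimates[int((alpha / 2.0) * (b - 1))]        # assumed quantile rule
    upper = estimates[int((1.0 - alpha / 2.0) * (b - 1))]
    # Limits larger than 0 are set equal to 0.
    return min(lower, 0.0), min(upper, 0.0)
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when comparing the bootstrap and jackknife intervals on the same data.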
III. ANALYSIS OF THE CONFIDENCE INTERVAL PROCEDURES


A. DESCRIPTION OF SIMULATION 

A Fortran program is written to simulate the semi-Markov process and compute
the confidence intervals. All simulations are carried out on an IBM 3033AP computer
at the Naval Postgraduate School using the LLRANDOM II random number
generating package [Ref. 13].

The data for the simulated experiments are generated as follows:

(1) An individual starts in state 1 at time 0.

(2) A random number with distribution $F_1$ is generated for the sojourn time in
state 1.

(3) A uniform random number is generated. If it is less than $\theta$, the process
transitions to state 0 and the data for one individual are complete. If it is greater
than $\theta$, the process transitions to state 2 and a random number having distribution
function $F_2$ is generated for the sojourn time in state 2. The procedure then returns
to step 2.

Data are generated for N individuals and are collected as follows:

(1) The passage times to state 0 for each individual;

(2) The sojourn times in states 1 and 2 for each individual;

(3) The number of transitions from state 1 to state 2 for each individual.

For each replication of the simulation the estimates and confidence intervals in
Chapter II are computed for ln P{D > t}. Each simulation is replicated 300 times. The
true ln P{D > t} is computed. For each procedure, the number of confidence intervals
covering the true value is recorded.

The numbers of intervals that are too low (true ln P{D > t} > U, the upper
endpoint of the interval) and too high (true ln P{D > t} < L, the lower endpoint of the
interval) are also recorded. The average length of the confidence interval is computed
as well as the standard deviation of the lengths. The standard deviation is computed by
subtracting the mean length from each length, squaring the results, summing over the
300 lengths and finally dividing by 299.
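The data generation steps above can be sketched in Python as follows. This is an illustration and not the Fortran program or the LLRANDOM II generator used in the thesis; the function names are hypothetical, and the sojourn time distributions $F_1$ and $F_2$ are passed in as sampling functions.

```python
import random
from typing import Callable, List, Tuple

Draw = Callable[[random.Random], float]


def simulate_individual(theta: float, draw_f1: Draw, draw_f2: Draw,
                        rng: random.Random) -> Tuple[float, List[float], List[float]]:
    """Return (passage time to state 0, sojourns in state 1, sojourns in state 2)."""
    sojourns_1: List[float] = []
    sojourns_2: List[float] = []
    while True:
        sojourns_1.append(draw_f1(rng))       # step (2): sojourn in state 1
        if rng.random() < theta:              # step (3): absorbed in state 0
            break
        sojourns_2.append(draw_f2(rng))       # otherwise a sojourn in state 2
    passage_time = sum(sojourns_1) + sum(sojourns_2)
    return passage_time, sojourns_1, sojourns_2


def simulate_sample(n: int, theta: float, draw_f1: Draw, draw_f2: Draw,
                    rng: random.Random):
    """Independent paths for n individuals."""
    return [simulate_individual(theta, draw_f1, draw_f2, rng) for _ in range(n)]
```

For the exponential model, `draw_f1` can be `lambda r: r.expovariate(1.0)` and `draw_f2` can be `lambda r: r.expovariate(10.0)`; the number of transitions from state 1 to state 2 for an individual is the length of its state 2 list.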
B. SIMULATION RESULTS FOR THE EXPONENTIAL MODEL

In this section results will be reported for a simulation experiment in which the
sojourn time distributions in both states are exponential; $F_1$ is exponential with mean
$1/p_1$ and $F_2$ is exponential with mean $1/p_2$. Some true values for P{D > t} for this case
can be found in Appendix A.

For each replication nominal 80% and 90% confidence intervals for ln P{D > t}
for various values of t are computed using each procedure of Chapter II. The times
considered are t = 0.5, 1.0, 1.5, 2.0, 3.0, 4.0.

The confidence interval procedures for ln P{D > t} are the binomial confidence
interval (= BIN) and its approximating normal confidence interval (= NOR), for
the ln fraction of individuals who have not entered state 0 by time t; the maximum
likelihood confidence interval (= MLE); the jackknife confidence interval for the
asymptotic renewal estimator (= J.K); and the bootstrap confidence interval for the
asymptotic renewal estimator (= B.T). For each procedure the number of intervals
covering the true value of ln P{D > t} is recorded as well as the number of intervals
that are too high or too low. These results are reported in Tables 1, 3, 5 and 7. Next to
each coverage count is given the corresponding coverage proportion in parentheses.
Below the coverage numbers is given a 95% confidence interval for the coverage rate.
This interval is computed as follows. If P is the proportion of $100(1-\alpha)\%$ intervals that
cover the true value, then a 95% confidence interval for the coverage rate is

$$\{L, U\} = P \pm 1.96\sqrt{P(1-P)/300}.$$

If a $100(1-\alpha)\%$ confidence interval procedure is performing well, then this interval should
contain the nominal coverage rate $1-\alpha$. In addition, if an 80% (respectively 90%) confidence
interval procedure is working well, then out of 300 replications between 226
(respectively 260) and 254 (respectively 280) confidence intervals should cover the true
value. Simulations for which the numbers of intervals that cover are outside of these
ranges are given in bold type. In the first column of Tables 1, 3, 5 and 7 the true
P{D > t} is given in parentheses.
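The acceptance ranges quoted above follow from the normal approximation to the binomial count of covering intervals. A small Python sketch reproduces them; the function names are hypothetical, and rounding the limits to the nearest integer is an assumption that happens to match the quoted values.

```python
import math
from typing import Tuple


def coverage_rate_interval(covered: int, reps: int = 300,
                           z: float = 1.96) -> Tuple[float, float]:
    """95% interval for the coverage rate given the number of covering intervals."""
    p = covered / reps
    half = z * math.sqrt(p * (1.0 - p) / reps)
    return p - half, p + half


def acceptable_counts(level: float, reps: int = 300,
                      z: float = 1.96) -> Tuple[int, int]:
    """Range of covering counts consistent with a nominal coverage level."""
    half = z * math.sqrt(reps * level * (1.0 - level))
    return round(reps * level - half), round(reps * level + half)
```

For the 80% level the half-width is $1.96\sqrt{300 \times 0.8 \times 0.2} \approx 13.6$ around 240, giving 226 to 254; for the 90% level it is $1.96\sqrt{300 \times 0.9 \times 0.1} \approx 10.2$ around 270, giving 260 to 280.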

In Tables 2, 4, 6 and 8 are recorded the average lengths of the confidence intervals
for ln P{D > t}. The estimated standard deviation of the length is given below the average
length in parentheses. If a procedure is performing well, it should not only have the
correct coverage rate but also a small average length. The simulation results recorded
in Tables 1 - 4 are for a simulation whose parameter values are $p_1 = 1.0$, $p_2 = 10.0$ and
$\theta = 0.5$. The number of individuals N is set at 20 and 50, representing a low and
moderate number of individuals.

Coverage results for N = 50 individuals are presented in Table 1. The binomial
confidence interval procedure tends to overcover. The maximum likelihood procedure
has the right coverage for all cases. The two confidence interval procedures using the
asymptotic renewal estimator undercover for t ≤ 1.0 and have the right coverage for
t ≥ 1.5. When a jackknife confidence interval does not cover, it is often because it is too
high (true value < L).

Sample means and variances of the confidence interval lengths, for N = 50
individuals, are reported in Table 2. The average lengths for the maximum likelihood
and bootstrap procedures are very close for t larger than 1.0. The bootstrap and jackknife
procedures have larger average confidence interval lengths than those for the maximum
likelihood estimator but smaller than for the binomial procedure.

Results for a simulation with N = 20 individuals are given in Tables 3 and 4.
They are similar to those in Tables 1 and 2. However, the average length of the
intervals is larger in Table 4 than in Table 2, reflecting the smaller number of
individuals.

Results of a simulation with the same parameters as the first but with $\theta = 2/3$
are given in Tables 5 - 8. Results for N = 50 individuals are given in Tables 5 and
6. The coverage results are in Table 5. The binomial procedure has better coverage
than in Table 1. The procedures based on the asymptotic renewal estimator have the
correct coverage only for t ≥ 2.0. The average lengths of the intervals in Table 6 are in
general longer than the lengths in Table 2.

Results for a simulation with $\theta = 2/3$ and N = 20 individuals are given in Tables
7 and 8. The binomial confidence interval tends to overcover when the true probability
is greater than 0.5241 or less than 0.2753. Once again the confidence intervals based
on the asymptotic renewal procedure undercover for t < 2.0. The maximum likelihood
confidence intervals give the correct coverage.
TABLE 8. LENGTH OF C.I. FOR ln P{D > t} (EXPONENTIAL MODEL), N = 20, $\theta = 2/3$, $p_1 = 1.0$, $p_2 = 10.0$
[Table body not recoverable from the source; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


C. ROBUSTNESS

The robustness of the estimates in Chapter II was studied with respect to an
incorrect model assumption about the distribution of the sojourn time in state 1. In
the previous simulations, the maximum likelihood estimator used the known correct
model.

The data for the simulation experiment in this section are generated from the
following three state semi-Markov process. Individuals start in state 1 at t = 0. The
probability of a transition to state 0 is $\theta$, and to state 2 is $1-\theta$. From state 2, the
probability of a transition to state 1 is 1. State 0 is an absorbing state. The sojourn
time in state 2 is exponential with mean $1/p_2$. The sojourn time in state 1 is the sum of
two independent exponentials with means $1/p_1$ and $1/p_3$; that is, the sojourn time in
state 1 has a hypoexponential distribution. The same Fortran program for the
simulation is used, slightly modified for the above change. The data generated are
analyzed by the same Fortran programs for each estimator as in the first section. In
particular, the maximum likelihood estimator assumes the sojourn time in state 1 has
an exponential distribution rather than the true hypoexponential distribution.

For the first simulation results, reported in Tables 9 to 12, parameter values of $p_1 = 2.0$,
$p_2 = 10.0$, $p_3 = 2.0$, and $\theta = 0.5$ are used. Again two different numbers of
individuals are used; they are 20 and 50. The simulation is replicated 300 times, and the
coverage numbers and the average lengths of the confidence intervals are computed.
The actual value of the survival function is computed by inverting the Laplace
transform of the passage time to state 0 for the semi-Markov process. Some computed
values of the survivor function can be found in Appendix B.

Table 9 reports coverage results for the case N = 50 individuals. The use of the
incorrect model for the maximum likelihood estimator leads to the majority of the
intervals being too low for t = 0.5 and t = 1.0. The binomial confidence interval has a
slight tendency to overcover. The confidence intervals based on the asymptotic renewal
estimator undercover for t = 0.5 and t = 1.0. For larger values of t they have the
correct coverage. The results for the jackknife and bootstrap intervals are similar to
those in Table 1. Table 10 reports the average lengths of the confidence intervals. The
average lengths for the maximum likelihood procedure are similar to those of Table 2.
For the other procedures, the average lengths are smaller.

Results of the simulation experiment for N = 20 individuals are given in Tables
11 and 12. Table 11 shows the coverage results. The maximum likelihood procedure
undercovers for t = 0.5 and t = 1.0 and overcovers for t = 1.5 and t = 2.0. The binomial
procedure once again tends to overcover. The procedures based on the asymptotic
renewal estimator have the correct coverage for all t except t = 0.5. The average lengths
of the intervals are given in Table 12.

Tables 13 to 16 report results of simulation of a semi-Markov model with $\theta = 2/3$
but with the other parameters the same: $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$.

Results for N = 20 individuals appear in Tables 15 and 16. The confidence
intervals based on the asymptotic renewal estimator have the correct coverage for all t
except 0.5 and 1.0. The results are similar to those of Tables 11 and 12.


TABLE 9. COVERAGE RATIO, C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 50, $\theta = 0.5$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; rows: too high, cover, too low, int.val; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


TABLE 10. LENGTH OF C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 50, $\theta = 0.5$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


TABLE 11. COVERAGE RATIO, C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 20, $\theta = 0.5$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; rows: too high, cover, too low, int.val; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


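TABLE 12. LENGTH OF C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 20, $\theta = 0.5$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]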

TABLE 13. COVERAGE RATIO, C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 50, $\theta = 2/3$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; rows: too high, cover, too low, int.val; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


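TABLE 14. LENGTH OF C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 50, $\theta = 2/3$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]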


TABLE 15. COVERAGE RATIO, C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 20, $\theta = 2/3$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; rows: too high, cover, too low, int.val; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]


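TABLE 16. LENGTH OF C.I. FOR ln P{D > t} (HYPOEXP MODEL), N = 20, $\theta = 2/3$, $p_1 = 2.0$, $p_2 = 10.0$, $p_3 = 2.0$
[Table body not recoverable from the source; columns: time, C.I. (%), BIN, NOR, MLE, J.K, B.T.]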


IV. CONCLUSIONS 


This thesis considers the problem of estimating the log probability for a semi- 
Markov process which does not enter a particular state before time t. 

Simulation is used to study procedures for obtaining confidence intervals for the
natural logarithm of the probability that a semi-Markov process first enters a fixed
state after time t. The data arise from observing N individuals. Three estimators and
associated confidence interval procedures are compared. The three estimators use
different amounts of information about the process. The maximum likelihood estimator
and its normal confidence interval use the most information. An estimator based on
the empirical distribution function of the observed first passage times uses the
least. An estimator based on an exponential approximation to the survivor function
of the first passage time uses an intermediate amount of information; confidence
intervals for this last estimator are obtained using jackknife and bootstrap
procedures. The simulation results indicate the following.

(1) Larger numbers of individuals result in smaller average confidence interval 
lengths. 

(2) The binomial confidence intervals tend to overcover. This is true in general
since the target coverage probability is a lower bound on the true coverage
probability. These intervals also tend to have larger average length.

(3) The maximum likelihood confidence intervals have the correct coverage if the 
model on which they are based is correct. If the model is incorrect they can 
either overcover or undercover. 

(4) If a jackknife interval does not cover the true value, the interval will tend to
be 'too high'.

(5) The confidence intervals using the asymptotic renewal estimator have the
correct coverage for large t.

(6) The bootstrap confidence interval requires much more computation than the
jackknife confidence interval. Since the results for the jackknife and bootstrap
procedures are similar, ease of computation appears to favor the jackknife
procedure.
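
Conclusion (2) can be checked numerically. The sketch below assumes that the binomial interval is an exact (Clopper-Pearson type) interval, which this section does not state; the function names are illustrative and are not taken from the thesis. For such an interval the true coverage can be computed by summing binomial probabilities over the outcomes whose interval contains the true p, and it never falls below the nominal level.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target):
    """Bisection for f(p) = target on (0, 1), where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k, n, level):
    """Exact two-sided interval for p given k successes in n trials (assumed form)."""
    alpha = 1 - level
    # Lower limit solves P(X >= k | p) = alpha / 2, i.e. cdf(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def true_coverage(n, p, level):
    """Probability that the exact interval contains p when X ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = exact_interval(k, n, level)
        if lower <= p <= upper:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total
```

Calling true_coverage(50, 0.5, 0.80), for example, gives the actual coverage of a nominal 80% interval with N = 50 individuals; under the stated assumption this value is at least 0.80.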
APPENDIX A
TRUE PROBABILITY TABLE I


TABLE 17
EXPONENTIAL MODEL P{D > TIME}


time    p1 = 1.0, p2 = 10.0    p1 = 1.0, p2 = 10.0
        θ = 0.5                θ = 0.6667
0.0     1.0000                 1.0000
0.5     0.7866                 0.7231
1.0     0.6203                 0.5241
1.5     0.4891                 0.3799
2.0     0.3857                 0.2753
2.5     0.3042                 0.1995
3.0     0.2399                 0.1446
3.5     0.1891                 0.1048
4.0     0.1492                 0.0760
4.5     0.1176                 0.0551
5.0     0.0928                 0.0399
5.5     0.0731                 0.0289
6.0     0.0577                 0.0210
6.5     0.0455                 0.0152
7.0     0.0359                 0.0110
7.5     0.0283                 0.0080
8.0     0.0223                 0.0058
8.5     0.0176                 0.0042
9.0     0.0139                 0.0030


The sojourn time in state 1 has an exponential distribution with mean 1/p1. The
sojourn time in state 2 has an exponential distribution with mean 1/p2.
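
The entries of Table 17 can be reproduced with a short calculation. The sketch below rests on an assumption about the transition structure that this appendix does not state: starting in state 1, the process enters the target state with probability θ at the end of each sojourn in state 1, and otherwise moves to state 2 and then returns to state 1. This structure was chosen because it matches the tabulated values to four decimals. The function name is illustrative.

```python
import math


def survival_exponential(t, p1, p2, theta):
    """P{D > t} starting in state 1, under the assumed two-state structure."""
    # The transient generator is [[-p1, (1 - theta) * p1], [p2, -p2]];
    # its eigenvalues solve s^2 + (p1 + p2) s + theta * p1 * p2 = 0.
    b = p1 + p2
    c = theta * p1 * p2
    root = math.sqrt(b * b - 4.0 * c)
    s1 = (-b + root) / 2.0
    s2 = (-b - root) / 2.0
    # P(t) = a exp(s1 t) + (1 - a) exp(s2 t), with P(0) = 1 and P'(0) = -theta * p1.
    a = (-theta * p1 - s2) / (s1 - s2)
    return a * math.exp(s1 * t) + (1.0 - a) * math.exp(s2 * t)
```

For example, survival_exponential(0.5, 1.0, 10.0, 0.5) agrees with the tabulated 0.7866, and survival_exponential(0.5, 1.0, 10.0, 2.0 / 3.0) agrees with 0.7231.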
APPENDIX B
TRUE PROBABILITY TABLE I


TABLE 18
HYPOEXPONENTIAL MODEL P{D > TIME}


time    p1 = 2.0, p2 = 10.0, p3 = 2.0
        θ = 0.6667
0.0     1.0000
0.5     0.8214
1.0     0.5788
1.5     0.3936
2.0     0.2652
2.5     0.1783
3.0     0.1197
3.5     0.0804
4.0     0.0540
4.5     0.0363
5.0     0.0244
5.5     0.0164
6.0     0.0110
6.5     0.0074
7.0     0.0050
7.5     0.0033
8.0     0.0022
8.5     0.0015
9.0     0.0010





The sojourn time in state 1 is the sum of two exponential random variables with
means 1/p1 and 1/p3. The sojourn time in state 2 has an exponential distribution
with mean 1/p2.
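
The hypoexponential values can be checked in the same spirit. The sketch below assumes a transition structure that this appendix does not state: state 1 is split into two exponential stages with rates p1 and p3, at the end of the second stage the process enters the target state with probability θ and otherwise moves to state 2, and from state 2 it returns to the first stage. The survival probability is obtained by integrating the forward equations with a classical fourth-order Runge-Kutta scheme, a numerical method swapped in here for convenience rather than the author's. The function name and the steps parameter are illustrative.

```python
def survival_hypoexp(t, p1, p2, p3, theta, steps=4000):
    """Approximate P{D > t} by RK4 integration of the forward equations."""

    def deriv(y):
        a, b, c = y  # a, b: the two stages of state 1; c: state 2
        return (
            -p1 * a + p2 * c,
            p1 * a - p3 * b,
            (1.0 - theta) * p3 * b - p2 * c,
        )

    def shifted(y, k, scale):
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    y = (1.0, 0.0, 0.0)  # the process starts in the first stage of state 1
    h = t / steps
    for _ in range(steps):
        k1 = deriv(y)
        k2 = deriv(shifted(y, k1, h / 2.0))
        k3 = deriv(shifted(y, k2, h / 2.0))
        k4 = deriv(shifted(y, k3, h))
        y = tuple(
            yi + h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
            for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)
        )
    # The survival probability is the total mass still in the transient states.
    return sum(y)
```

With p1 = 2.0, p2 = 10.0, p3 = 2.0 and θ = 2/3, the values at t = 0.5, 1.0 and 2.0 agree with the tabulated 0.8214, 0.5788 and 0.2652.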
APPENDIX C
AVERAGE LENGTH OF C.I FOR P{D>T} 


Recorded in this Appendix are the average lengths of confidence intervals for
P{D>t} corresponding to those given in the thesis for ln P{D>t}. If a confidence limit
is greater than 1, it is set equal to 1. If a confidence limit is less than 0, it is set
equal to 0. The results are similar to those in the previous tables.
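
The truncation rule can be sketched as follows. This is a minimal illustration, assuming each replication yields a (lower, upper) pair of limits on the probability scale; the function names are hypothetical and do not come from the thesis.

```python
def clip_to_unit(x):
    """Set limits above 1 to 1 and limits below 0 to 0."""
    return min(1.0, max(0.0, x))


def average_length(intervals):
    """Average length of (lower, upper) intervals after truncation to [0, 1]."""
    lengths = [clip_to_unit(upper) - clip_to_unit(lower)
               for lower, upper in intervals]
    return sum(lengths) / len(lengths)
```

For instance, the intervals (-0.05, 0.20) and (0.90, 1.10) are truncated to (0, 0.20) and (0.90, 1), so their average length is 0.15.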


TABLE 18
LENGTH OF C.I FOR P{D>t} (EXPONENTIAL MODEL)
N = 50, θ = 0.5, p1 = 1.0, p2 = 10.0

time  C.I (%)  BIN          NOR          MLE          J.K          B.T
0.5   80%      0.1651       0.1469       0.0692       0.1259       0.1044
               (0.014)      (0.014)      (0.007)      (0.058)      (0.027)
      90%      0.2054       0.1887       0.0889       0.1669       0.1330
               (0.018)      (0.019)      (0.009)      (0.077)      (0.033)
1.0   80%      0.1915       0.1741       0.1078       0.1278       0.1154
               (0.006)      (0.007)      (0.007)      (0.036)      (0.015)
      90%      0.2389       0.2236       0.1384       0.1694       0.1483
               (0.008)      (0.009)      (0.009)      (0.048)      (0.018)
1.5   80%      0.1967       0.1795       0.1264       0.1381       0.1289
               (0.002)      (0.003)      (0.005)      (0.035)      (0.014)
      90%      0.2455       0.2305       0.1623       0.1830       0.1650
               (0.003)      (0.003)      (0.006)      (0.047)      (0.019)
2.0   80%      0.1913       0.1740       0.1320       0.1427       0.1334
               (0.007)      (0.007)      (0.002)      (0.035)      (0.015)
      90%      0.2387       0.2234       0.1695       0.1892       [illegible]
               (0.008)      (0.009)      (0.002)      (0.046)      (0.019)
3.0   80%      0.1706       0.1526       0.1220       0.1325       0.1233
               (0.012)      (0.013)      (0.008)      (0.031)      (0.016)
      90%      0.2123       0.1960       0.1567       0.1756       0.1569
               (0.016)      (0.016)      (0.010)      (0.041)      (0.021)
4.0   80%      0.1459       0.1271       0.1007       0.1092       0.1009
               (0.017)      (0.017)      (0.013)      (0.028)      (0.018)
      90%      0.1820       0.1631       0.1293       0.1447       0.1283
               (0.021)      (0.023)      (0.017)      (0.037)      (0.023)
TABLE 19
LENGTH OF C.I FOR P{D>t} ([illegible] MODEL)
[parameter line illegible apart from "p = 10.0"]

time  C.I (%)  BIN          NOR          MLE          J.K          B.T
0.5   80%      0.2689       0.2262       0.1111       0.1746       0.1584
               (0.037)      (0.045)      (0.018)      (0.065)      (0.055)
      90%      0.3281       0.2891       0.1426       0.2274       0.2094
               (0.046)      (0.060)      (0.023)      (0.085)      (0.078)
1.0   80%      0.3087       0.2702       0.1708       0.1913       0.1820
               (0.020)      (0.023)      (0.018)      (0.050)      (0.035)
      90%      0.3775       0.3469       0.2194       0.2492       0.2356
               (0.024)      (0.030)      (0.023)      (0.065)      (0.047)
1.5   80%      0.3169       0.2792       0.1985       0.2165       0.2030
               (0.010)      (0.011)      (0.010)      (0.051)      (0.029)
      90%      0.3876       0.3585       0.2549       0.2820       0.2617
               (0.013)      (0.014)      (0.013)      (0.067)      (0.040)
2.0   80%      0.3082       0.2698       0.2058       0.2244       0.2077
               (0.019)      (0.021)      (0.007)      (0.049)      (0.028)
      90%      0.3768       0.3463       0.2643       0.2923       0.2650
               (0.024)      (0.027)      (0.009)      (0.064)      (0.037)
3.0   80%      [illegible]  0.2345       0.1886       0.2027       0.1954
               (0.032)      (0.037)      (0.022)      (0.049)      (0.034)
      90%      0.3365       0.3001       0.2422       0.2640       0.2338
               (0.040)      (0.049)      (0.029)      (0.063)      (0.044)
4.0   80%      0.2377       0.1895       0.1554       0.1627       0.1485
               (0.045)      (0.057)      (0.032)      (0.050)      (0.040)
      90%      0.2896       0.2389       0.1992       0.2118       0.1860
               (0.055)      (0.077)      (0.042)      (0.065)      (0.049)
TABLE 21
LENGTH OF C.I FOR P{D>t} ([illegible] MODEL)
[parameter line illegible apart from "/3, p"]

time  C.I (%)  BIN          NOR          MLE          J.K          B.T
0.5   80%      0.2874       [illegible]  0.1374       0.3199       0.2984
               (0.030)      (0.033)      (0.021)      (0.230)      (0.158)
      90%      0.3508       0.3167       0.1765       0.4166       0.4154
               (0.037)      (0.045)      (0.027)      (0.300)      (0.238)
1.0   80%      0.3151       0.27??       0.1926       0.2595       0.2520
               (0.012)      (0.013)      (0.014)      (0.098)      (0.077)
      90%      0.3854       0.3560       0.2473       0.3380       0.3355
               (0.015)      (0.016)      (0.018)      (0.127)      (0.121)
1.5   80%      0.3037       0.2670       0.2046       0.2539       0.2387
               (0.022)      (0.024)      (0.011)      (0.076)      (0.056)
      90%      0.3736       0.3428       0.2627       0.3306       0.3084
               (0.027)      (0.031)      (0.014)      (0.099)      (0.078)
2.0   80%      0.2847       0.2440       0.2947       0.2338       0.2140
               (0.034)      (0.038)      [illegible]  (0.071)      (0.051)
      90%      0.3405       0.3123       [illegible]  0.3044       0.2730
               (0.042)      (0.051)      (0.028)      (0.093)      (0.066)
3.0   80%      0.2334       0.1842       0.1515       0.1700       0.1542
               (0.047)      (0.061)      (0.036)      (0.066)      (0.054)
      90%      0.2842       0.2316       0.1940       0.2213       0.1936
               (0.056)      (0.081)      (0.048)      (0.086)      (0.066)
4.0   80%      0.1893       0.1253       0.1069       0.1117       0.1024
               (0.051)      (0.075)      (0.039)      (0.057)      (0.049)
      90%      0.2316       0.1536       0.1351       0.1455       0.1281
               (0.060)      (0.096)      (0.052)      (0.074)      (0.060)
TABLE 22